Skip to content

fix(portal): security hardening from v0.2.0 post-merge audit#101

Merged
voyvodka merged 1 commit into
mainfrom
fix/portal-security-hardening
May 11, 2026
Merged

fix(portal): security hardening from v0.2.0 post-merge audit#101
voyvodka merged 1 commit into
mainfrom
fix/portal-security-hardening

Conversation

@voyvodka
Copy link
Copy Markdown
Owner

Summary

Post-merge audit of the B1 embeddable customer portal surfaced three P0 findings. This PR closes all three in one focused fix.

Findings addressed

  1. Rate limit bypassPortalEndpointsController carried no [EnableRateLimiting] attribute. A leaked or misbehaving portal token could spam mutating routes (notably /test, which fires a real outbound HTTP POST) without sharing the per-tenant token-bucket budget that the public API has always enforced.
    → Controller-level [EnableRateLimiting(\"send-by-appid\")] reuses the existing partition; PortalTokenAuthMiddleware already populates HttpContext.Items[\"AppId\"] upstream of RateLimiter.

  2. JWT handler DoS amplificationPortalTokenAuthMiddleware used a static JwtSecurityTokenHandler with the .NET defaults (MapInboundClaims = true, MaximumTokenSizeInBytes ≈ 250 KiB). A multi-hundred-KB Bearer payload was parsed before being rejected.
    → Per-instance handler with MapInboundClaims = false (we read raw appId / capabilities claim keys, so mapping is pure overhead) and MaximumTokenSizeInBytes = PortalAuthOptions.MaxTokenSizeBytes (default 8 KiB). Portal tokens are typically 0.5-2 KB; 8 KiB leaves comfortable headroom.

  3. PortalLookupCache invalidation raceSet() used GetOrAdd to reuse the per-app CancellationTokenSource. After an InvalidateApplication cancelled+disposed it, the next Set would bind a fresh cache entry to that disposed token (CancellationChangeToken over a disposed source → entry expires immediately, or worse, throws on a tight race).
    Set now atomically AddOrUpdate-swaps the CTS and cancels+disposes the previous one. Both Set and InvalidateApplication route through a shared CancelAndDispose helper that swallows ObjectDisposedException on the concurrent-double-dispose edge.

Tests

  • New regression test Portal_Request_With_Oversized_Token_Returns_401_Without_Parsing pins the 8 KiB cap (sends 16 KiB Bearer payload, expects 401 PORTAL_AUTH_INVALID_TOKEN).
  • Cache-race and rate-limit integration coverage are scoped to the next follow-up PR (full P0 portal test coverage backlog).

Test plan

  • dotnet build WebhookEngine.sln --configuration Release — 0 warnings, 0 errors
  • dotnet test — 253/253 passing (Core 34, API 108, Worker 46, Infrastructure 65)
  • CHANGELOG entry under Unreleased > Security
  • CI green on PR

Follow-ups (tracked, not in this PR)

  • P0 portal test coverage gaps (CORS middleware, cross-tenant DELETE/enable/disable/test, AnyAllowsPortalOriginAsync Testcontainers, PortalLookupCache TTL/invalidate)
  • P1 docs (docs/API.md and docs/ARCHITECTURE.md portal sections)
  • P1 behavior fixes (CORS deny-cache, validator merge, PUT→PATCH, disable preserves origins)

Addresses three P0 findings from the post-merge review of the B1
embeddable portal stack:

1. PortalEndpointsController gains a controller-level [EnableRateLimiting]
   attribute so portal mutating routes share the per-tenant token-bucket
   budget. Previously a leaked token could spam /test (real outbound
   HTTP POST) without limit.

2. PortalTokenAuthMiddleware switches from a static JwtSecurityTokenHandler
   to a per-instance one with MapInboundClaims=false and an
   options-driven MaximumTokenSizeInBytes (PortalAuthOptions.MaxTokenSizeBytes,
   default 8 KiB). Defeats the .NET default 250 KiB DoS amplification
   surface and removes per-request claim mapping overhead.

3. PortalLookupCache.Set now atomically AddOrUpdate-swaps the per-app
   CancellationTokenSource and cancels+disposes the previous one,
   instead of GetOrAdd-reusing it. Closes a race between Set and
   InvalidateApplication that could bind a fresh cache entry to a
   disposed token.

Adds Portal_Request_With_Oversized_Token_Returns_401_Without_Parsing
regression test to pin the size cap.
@voyvodka voyvodka added api API layer and endpoints infrastructure Docker, CI, deployment security Security-related issues labels May 11, 2026
@voyvodka voyvodka enabled auto-merge (squash) May 11, 2026 07:42
@voyvodka voyvodka merged commit 9a372f4 into main May 11, 2026
7 checks passed
@voyvodka voyvodka deleted the fix/portal-security-hardening branch May 11, 2026 07:45
voyvodka added a commit that referenced this pull request May 11, 2026
…it doc-drift) (#103)

Closes the documentation half of the v0.2.0 portal audit. Tur 1 (#101)
was the security fix, Tur 2 (#102) was the test coverage; this PR is
the doc drift the same audit surfaced.

docs/API.md §3.8 — Portal API (Customer-Facing JWT):
- HS256 JWT contract: algorithm pin, signing key per-app, lifetime
  cap, clock skew, token size cap, required + optional claims.
- Capability table: endpoints:read|write|test, attempts:read.
- Per-app CORS rules: no wildcards, https-only, RFC 6454 case-
  insensitive matching, preflight semantics.
- Rate limit: shares send-by-appid partition; cross-tenant lookups
  return 404 (never 403, which would leak existence).
- All 10 portal routes documented with request/response shape.
- 5 dashboard portal-admin routes documented.
- Portal-specific error code table.
- End-to-end probe with jose (Node.js mint) + cURL.

docs/ARCHITECTURE.md §4.3 — Portal Token Authentication:
- Per-application secrets stored on Application (PortalSigningKey,
  AllowedPortalOriginsJson, PortalRotatedAt).
- Pipeline ordering with the three invariants it encodes (ApiKeyAuth
  bypass, PortalToken-before-PortalCors, both-before-RateLimiter).
- PortalLookupCache: TTL, instant local invalidation, atomic CTS swap.
- Cross-tenant isolation via 2-arg GetByIdAsync.
- JWT validator defense-in-depth (HS256 pin, 8 KiB token cap,
  MapInboundClaims=false, lifetime cap, opaque error bodies).
voyvodka added a commit that referenced this pull request May 11, 2026
…alidator merge, PUT→PATCH, disable preserves origins) (#104)

Closes the four P1 behaviour findings from the v0.2.0 portal audit. Tur 1
(#101) shipped the P0 security fixes, Tur 2 (#102) shipped the test
coverage, Tur 3 (#103) shipped the docs; this Tur 4 closes the behaviour
delta.

1. PortalCorsMiddleware deny-cache.
   HandlePreflightAsync now caches both allow and deny outcomes via
   IMemoryCache for the same TTL as the per-app signing-key lookup
   (default 60s). Browsers don't cache rejected preflights, so every
   OPTIONS from a disallowed origin used to re-scan the portal-enabled
   app set + deserialize the JSON allowlist — a free DB hammer vector
   for any caller that knew the portal prefix. Cache key is
   lowercased-origin scoped (RFC 6454 §4 case-insensitive). New test
   Preflight_Deny_Decision_Is_Cached_Within_Ttl pins the behaviour by
   mutating the DB to allow the origin after a 403 and asserting the
   second call still 403s within the TTL window.

2. EndpointValidationRules helper consolidation.
   New extension methods (EndpointUrlSyntax, EndpointDescription,
   EndpointTransformExpression, EndpointCustomHeaders,
   EndpointAllowedIpsCidrs, EndpointSecretOverride) become the single
   source of truth for the field-shape rules shared between the 4
   admin endpoint validators and the 2 portal endpoint validators.
   Without this consolidation, tightening a rule on one surface
   silently leaves the other surface weaker — exactly the drift
   pattern the audit flagged. Async DNS host-safety check stays in
   each validator (DependentRules + CustomAsync needs the full
   property selector). Behaviour unchanged.

3. PortalEndpointsController.Update → [HttpPatch].
   The action's body semantics were always partial-replace (every
   field optional, only non-null fields applied) — that's PATCH, not
   PUT. Switching the verb aligns the wire surface with reality and
   stops misleading REST consumers that expect PUT to be full-replace.
   Route, request shape, response shape unchanged. Two existing tests
   ported from PutAsJsonAsync to PatchAsync + JsonContent.Create.

4. DashboardPortalController.Disable preserves origins.
   Removed the line that nulled AllowedPortalOriginsJson on disable.
   Disable now revokes only the auth surface (PortalSigningKey,
   PortalRotatedAt) and keeps the operator-curated CORS allowlist so
   re-enable doesn't force re-curation. Explicit clear path remains:
   PUT /portal/origins with {origins: []}. Renamed test
   Disable_Clears_SigningKey_And_Origins → Disable_Clears_SigningKey_But_Preserves_Origins
   with assertion flip.
voyvodka added a commit that referenced this pull request May 11, 2026
…e TTL) (#105)

Locks in two v0.2.0 audit decisions that were marked 'worth recording'
in the reviewer agent's punch list. Both are pure documentation —
behaviour was already shipped in PRs #101 and #104.

ADR-004 — Portal Signing Key Storage:
- Plaintext varchar(64) at rest, mirroring the existing SigningSecret
  column. No application-level encryption; relies on operator's
  Postgres deployment posture.
- Instant invalidation on rotate / disable. No grace window for
  overlapping old + new key validity.
- One-shot reveal on enable / rotate; subsequent reads return only
  portalEnabled bool + rotated-at + origins (never the key).
- Documents alternatives (pgcrypto, rotation grace, external KMS)
  and revisit triggers (compliance regimes, host-integration
  friction, key reuse beyond JWT verification).

ADR-005 — Portal CORS Preflight Deny-Cache TTL:
- 60s default (LookupCacheTtlSeconds reuse); cache both allow and
  deny outcomes; key is lowercased-origin scoped.
- No synchronous invalidation hook from PUT /portal/origins:
  cache key is origin-scoped, operator action is app-scoped —
  bookkeeping cost outweighs the <=60s staleness.
- Documents alternatives (per-app invalidation, shorter/longer TTL,
  negative-only cache) and revisit triggers (operator UX complaints,
  memory pressure via origin enumeration).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

api API layer and endpoints infrastructure Docker, CI, deployment security Security-related issues

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant